Data Preload for Superscalar

Author

Wen-mei W. Hwu

Abstract

… decreased the average number of clock cycles per instruction. As a result, each execution cycle has become more significant to overall system performance. To maximize the effectiveness of each cycle, one must expose instruction-level parallelism and employ memory-latency-tolerant techniques. However, without special architectural support, a superscalar compiler cannot effectively accomplish these two tasks in the presence of control and memory access dependences. Preloading is a class of architectural support that allows memory reads to be performed early despite potential violation of control and memory access dependences. With preload support, a superscalar compiler can perform more aggressive code reordering, which increases tolerance of cache and memory access latencies and increases instruction-level parallelism. This thesis discusses the architectural features and compiler support required to effectively utilize preload instructions to increase overall system performance.

The first hardware support is preload register update, a data preload support for load scheduling to reduce first-level cache hit latency. Preload register update keeps the load destination registers coherent when load instructions are moved past store instructions that reference the same location. With this addition, superscalar processors can more effectively tolerate longer data access latencies.

The second hardware support is the memory conflict buffer. The memory conflict buffer extends preload register update support by allowing uses of the load to move above ambiguous stores. Correct program execution is maintained using the memory conflict buffer and repair code provided by the compiler. With this addition, substantial speedup over an aggressive code scheduling model is achieved for a set of control-intensive nonnumerical programs.

The last hardware support is the preload buffer. Large data sets and slow memory subsystems result in unacceptable performance for numerical programs. The preload buffer allows loads to be performed early while eliminating problems with cache pollution and extended register live ranges. Adding the prestore buffer allows loads to be scheduled in the presence of ambiguous stores. Preload buffer support in addition to cache prefetching support is shown to achieve better performance than cache prefetching alone for a set of benchmarks. In all cases, preloading decreases bus traffic and reduces the miss rate when compared with no prefetching or cache prefetching.

ACKNOWLEDGMENTS

Discussions with Professor Wen-mei Hwu have always given me insight into the problems I am attempting to solve. He not only guided me through my research difficulties, …
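
The abstract names the first two mechanisms without showing how they behave. As a rough illustration only, the toy Python model below hoists a load above a store that may alias it and keeps the result correct in two ways: by forwarding the stored value into the preloaded register (preload register update), or by recording a conflict that a later check uses to branch to compiler-provided repair code (memory conflict buffer). The names `Machine`, `MCBEntry`, `preload`, `store`, `check`, the two-mode switch, and the word-addressed memory are hypothetical simplifications, not the thesis's hardware interface.

```python
from dataclasses import dataclass


@dataclass
class MCBEntry:
    """One outstanding preload (hypothetical, simplified entry format)."""
    address: int
    conflict: bool = False


class Machine:
    """Toy machine: named registers, word-addressed memory, and a table of
    outstanding preloads keyed by destination register."""

    def __init__(self, memory, register_update=False):
        self.mem = dict(memory)  # copy so the caller's dict is not mutated
        self.regs = {}
        self.mcb = {}  # destination register -> MCBEntry
        # Assumption: True models preload register update, False models the MCB.
        self.register_update = register_update

    def preload(self, rd, addr):
        """Load hoisted above a possibly aliasing store; remember its address."""
        self.regs[rd] = self.mem[addr]
        self.mcb[rd] = MCBEntry(addr)

    def store(self, addr, value):
        """Write memory and compare the address with every outstanding preload."""
        self.mem[addr] = value
        for rd, entry in self.mcb.items():
            if entry.address == addr:
                if self.register_update:
                    self.regs[rd] = value  # forward the stored value into the register
                else:
                    entry.conflict = True  # leave correction to the repair code

    def check(self, rd):
        """Release the entry for rd and report whether a conflict was recorded."""
        entry = self.mcb.pop(rd, None)
        return entry is not None and entry.conflict


def run_with_register_update(a, b, memory):
    """Original order: store [a] = 5; r1 = load [b]; r2 = r1 + 1.
    Only the load is hoisted above the store; its use stays below."""
    m = Machine(memory, register_update=True)
    m.preload("r1", b)
    m.store(a, 5)
    m.check("r1")  # releases the entry; no conflict bit is set in this mode
    m.regs["r2"] = m.regs["r1"] + 1
    return m.regs["r2"]


def run_with_mcb(a, b, memory):
    """Same original code, but the use of the load is hoisted above the store too."""
    m = Machine(memory)
    m.preload("r1", b)
    m.regs["r2"] = m.regs["r1"] + 1  # speculative use of the preloaded value
    m.store(a, 5)
    if m.check("r1"):
        # Hypothetical compiler repair code: redo the load and its dependent use.
        m.regs["r1"] = m.mem[b]
        m.regs["r2"] = m.regs["r1"] + 1
    return m.regs["r2"]


mem = {100: 10, 200: 20}
print(run_with_register_update(200, 200, mem))  # 6
print(run_with_mcb(100, 200, mem))              # 21 (no alias, no repair)
print(run_with_mcb(200, 200, mem))              # 6 (alias detected, repaired)
```

In both runs the result matches the original program order. The model also hints at why the memory conflict buffer is needed: once the use `r2 = r1 + 1` moves above the store, updating `r1` alone would no longer fix `r2`, so the sketch re-executes the dependent chain instead.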

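The preload buffer described in the abstract can be sketched in the same toy style, under stated assumptions: `PreloadBuffer`, its capacity, its FIFO replacement, and the rule that a store updates a matching buffered word are illustrative choices, not details taken from the thesis. No cache and no prestore buffer are modeled; the sketch only shows early-fetched data waiting in a separate structure rather than in a register, until the load at its original position consumes it.

```python
from collections import OrderedDict


class PreloadBuffer:
    """Toy preload buffer: a small structure, separate from the cache and the
    register file, that holds words fetched early by preload instructions."""

    def __init__(self, memory, capacity=4):
        self.mem = dict(memory)  # backing memory (copied)
        self.capacity = capacity
        self.entries = OrderedDict()  # address -> value, oldest first
        self.memory_reads = 0

    def preload(self, addr):
        """Issue the memory read early; the value waits here, not in a register."""
        if addr in self.entries:
            return
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # FIFO replacement (an assumption)
        self.entries[addr] = self.mem[addr]
        self.memory_reads += 1

    def store(self, addr, value):
        """Write memory and keep any buffered copy of the same word up to date."""
        self.mem[addr] = value
        if addr in self.entries:
            self.entries[addr] = value

    def load(self, addr):
        """The actual load at its original position: consume a buffered word
        if present, otherwise read memory."""
        if addr in self.entries:
            return self.entries.pop(addr)
        self.memory_reads += 1
        return self.mem[addr]


buf = PreloadBuffer({0: 1, 8: 2, 16: 3}, capacity=2)
buf.preload(0)
buf.preload(8)
buf.store(8, 42)         # store between the preload and the load
print(buf.load(0))       # 1, served from the buffer
print(buf.load(8))       # 42, buffered copy was updated by the store
print(buf.load(16))      # 3, not preloaded, so memory is read
print(buf.memory_reads)  # 3
```

Because a consumed entry is removed, the buffer frees its slot as soon as the original load executes, which is one simple way to keep early-fetched data from occupying a register for the whole hoisting distance.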
Similar resources

Aggressive Scheduling for Memory Accesses of CISC Superscalar Microprocessors

For CISC microprocessors, the proportion of memory access instructions is relatively high, and a specific address is likely to be accessed repeatedly in a short period of time because of register-to-memory or memory-to-memory instruction set architectures and limited register sets. As superscalar architectures advance, an aggressive scheduling policy for memory access becomes crucial. In this p...

Cached Data State Controller Enable

By exploiting fine-grain parallelism, superscalar processors can potentially increase the performance of future supercomputers. However, supercomputers typically have a long access delay to their first-level memory, which can severely restrict the performance of superscalar processors. Compilers attempt to move load instructions far enough ahead to hide this latency. However, conventional movement ...

Tolerating Data Access Latency with Register … (International Conference on Supercomputing)

Improving Database Performance on Simultaneous Multithreading Processors

Simultaneous multithreading (SMT) allows multiple threads to supply instructions to the instruction pipeline of a superscalar processor. Because threads share processor resources, an SMT system is inherently different from a multiprocessor system and, therefore, utilizing multiple threads on an SMT processor creates new challenges for database implementers. We investigate three thread-based tec...

Preload Effect on Nonlinear Dynamic Behavior of Aerodynamic Two-Lobe Journal Bearings

This paper presents the effect of preload on nonlinear dynamic behavior of a rigid rotor supported by two-lobe aerodynamic noncircular journal bearing. A finite element method is employed to solve the Reynolds equation in static and dynamical states and the dynamical equations are solved using Runge-Kutta method. To analyze the behavior of the rotor center in the horizontal and vertical directi...

Publication date: 1993
